unseen label
GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot Learning
Liu, Ziming, Guo, Jingcai, Lu, Xiaocheng, Guo, Song, Dong, Peiran, Zhang, Jiewei
This paper investigates the challenging problem of multi-label zero-shot learning (MLZSL), wherein the model is trained to recognize multiple unseen classes within a sample (e.g., an image) based on seen classes and auxiliary knowledge, e.g., semantic information. Existing methods usually analyze the relationships among the seen classes residing in a sample along the spatial or semantic dimension and transfer the learned model to unseen classes, but they ignore the effective integration of local and global features. That is, when inferring unseen classes, global features represent the principal direction of the image in the feature space, while local features should maintain uniqueness within a certain range. Neglecting this integration makes the model lose its grasp of the image's main components, and relying only on the local presence of seen classes during inference introduces unavoidable bias. In this paper, we propose a novel and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, to make full use of these properties and enable a more accurate and robust visual-semantic projection. Specifically, we split the feature maps into several feature groups, each of which can be trained independently with the Local Information Distinguishing Module (LID) to ensure uniqueness. Meanwhile, a Global Enhancement Module (GEM) is designed to preserve the principal direction. In addition, a static graph structure is designed to model the correlation among local features. Experiments on the large-scale MLZSL benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that the proposed GBE-MLZSL outperforms other state-of-the-art methods by large margins.
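The local/global fusion the abstract describes can be sketched as follows. This is a minimal NumPy illustration with made-up dimensions and plain linear projections standing in for the paper's LID and GEM modules, not the actual GBE-MLZSL architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): C channels, H x W spatial
# map, G feature groups, d-dimensional semantic space.
C, H, W, G, d = 64, 7, 7, 4, 16
feat = rng.standard_normal((C, H, W))

# Split channels into G groups; each group gets its own independent "local"
# projection (a stand-in for the Local Information Distinguishing module).
groups = np.split(feat, G, axis=0)
W_local = [rng.standard_normal((d, C // G)) for _ in range(G)]
local_embs = [Wg @ g.mean(axis=(1, 2)) for Wg, g in zip(W_local, groups)]

# A single "global" projection preserves the principal direction of the
# whole image (a stand-in for the Global Enhancement Module).
W_global = rng.standard_normal((d, C))
global_emb = W_global @ feat.mean(axis=(1, 2))

# Bi-enhancement: fuse the independent local embeddings with the global one.
fused = global_emb + np.mean(local_embs, axis=0)
print(fused.shape)  # (16,)
```

The fused vector would then be compared against class semantic embeddings; the point of the sketch is only that local groups are projected independently while a separate global path anchors the principal direction.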
- Asia > China > Hong Kong (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.65)
Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP
Esmaeilpour, Sepideh, Liu, Bing, Robertson, Eric, Shu, Lei
In an out-of-distribution (OOD) detection problem, samples of known classes (also called in-distribution classes) are used to train a special classifier. In testing, the classifier can (1) classify the test samples of known classes into their respective classes and (2) detect samples that do not belong to any of the known classes (i.e., they belong to some unknown or OOD classes). This paper studies the problem of zero-shot out-of-distribution (OOD) detection, which performs the same two tasks in testing but uses no training data beyond the given known class names. This paper proposes a novel yet simple method (called ZOC) to solve the problem. ZOC builds on recent advances in zero-shot classification through multi-modal representation learning. It first extends the pre-trained language-vision model CLIP by training a text-based image description generator on top of CLIP. In testing, it uses the extended model to generate candidate unknown class names for each test sample and computes a confidence score based on both the known class names and the candidate unknown class names for zero-shot OOD detection. Experimental results on 5 benchmark datasets for OOD detection demonstrate that ZOC outperforms the baselines by a large margin.
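The scoring step can be sketched as follows. This is a toy NumPy version of the general idea (softmax over similarities to known plus candidate-unknown class embeddings, with the candidate mass as the OOD score); the temperature value and random embeddings are placeholders, not ZOC's actual computation:

```python
import numpy as np

def ood_confidence(img_emb, known_embs, candidate_embs, temp=0.07):
    """Softmax over similarities to known + candidate-unknown class names;
    the mass on the candidates serves as the OOD score (higher = more OOD)."""
    sims = np.concatenate([known_embs, candidate_embs]) @ img_emb
    logits = sims / temp
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[len(known_embs):].sum()

# Toy unit-normalized embeddings standing in for CLIP features.
rng = np.random.default_rng(1)
img = rng.standard_normal(8); img /= np.linalg.norm(img)
known = rng.standard_normal((3, 8))
known /= np.linalg.norm(known, axis=1, keepdims=True)
cand = rng.standard_normal((2, 8))
cand /= np.linalg.norm(cand, axis=1, keepdims=True)
score = ood_confidence(img, known, cand)
print(round(float(score), 3))
```

A test sample is flagged as OOD when this score exceeds a chosen threshold.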
- North America > Canada > Ontario > Toronto (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
Creating Training Sets via Weak Indirect Supervision
Zhang, Jieyu, Wang, Bohan, Song, Xiangchen, Wang, Yujing, Yang, Yaming, Bai, Jing, Ratner, Alexander
Creating labeled training sets has become one of the major roadblocks in machine learning. To address this, recent Weak Supervision (WS) frameworks synthesize training labels from multiple potentially noisy supervision sources. However, existing frameworks are restricted to supervision sources that share the same output space as the target task. To extend the scope of usable sources, we formulate Weak Indirect Supervision (WIS), a new research problem for automatically synthesizing training labels based on indirect supervision sources that have different output label spaces. To overcome the challenge of mismatched output spaces, we develop a probabilistic modeling approach, PLRM, which uses user-provided label relations to model and leverage indirect supervision sources. Moreover, we provide a theoretically principled test of the distinguishability of PLRM for unseen labels, along with a generalization bound. On both image and text classification tasks, as well as an industrial advertising application, we demonstrate the advantages of PLRM, outperforming baselines by a margin of 2%-9%.
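The core idea of exploiting label relations can be sketched as follows. This toy example spreads each indirect vote uniformly over the target labels it is consistent with; the relation table and label names are hypothetical, and this naive vote-spreading is only a stand-in for PLRM's probabilistic model:

```python
from collections import Counter

# Hypothetical user-provided label relation: each coarse label an indirect
# source can emit maps to the target-task labels consistent with it.
relation = {"pet": ["dog", "cat"], "canine": ["dog", "wolf"]}

def synthesize_label(indirect_votes):
    """Spread each indirect vote uniformly over the target labels it is
    consistent with, then take the argmax -- a toy stand-in for PLRM's
    handling of mismatched output spaces."""
    scores = Counter()
    for coarse in indirect_votes:
        fine_set = relation[coarse]
        for fine in fine_set:
            scores[fine] += 1.0 / len(fine_set)
    return scores.most_common(1)[0][0]

print(synthesize_label(["pet", "canine"]))  # dog (consistent with both sources)
```

Even though neither source outputs target-task labels directly, intersecting their relation sets singles out "dog".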
- North America > United States > New York > New York County > New York City (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (6 more...)
Noisy Channel Language Model Prompting for Few-Shot Text Classification
Min, Sewon, Lewis, Mike, Hajishirzi, Hannaneh, Zettlemoyer, Luke
We introduce a noisy channel approach for language model prompting in few-shot text classification. Instead of computing the likelihood of the label given the input (referred to as direct models), channel models compute the conditional probability of the input given the label, and are thereby required to explain every word in the input. We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters, via either in-context demonstration or prompt tuning. Our experiments show that, for both methods, channel models significantly outperform their direct counterparts, which we attribute to their stability, i.e., lower variance and higher worst-case accuracy. We also present extensive ablations that provide recommendations for when to use channel prompt tuning instead of other competitive models (e.g., direct head tuning): channel prompt tuning is preferred when the number of training examples is small, labels in the training data are imbalanced, or generalization to unseen labels is required.
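The direct-vs-channel distinction can be sketched as follows. Here a toy unigram model (with made-up probabilities) plays the role of the language model: the channel direction scores log P(input | label), so every input word must be explained by the label's distribution:

```python
import math

# Toy per-label word distributions (hypothetical numbers, not a real LM).
channel_lm = {
    "positive": {"great": 0.5, "movie": 0.3, "bad": 0.2},
    "negative": {"bad": 0.5, "movie": 0.3, "great": 0.2},
}

def channel_score(words, label):
    """log P(input | label): the channel model must explain every input word."""
    return sum(math.log(channel_lm[label][w]) for w in words)

words = ["great", "movie"]
pred = max(channel_lm, key=lambda lb: channel_score(words, lb))
print(pred)  # positive
```

A direct model would instead score P(label | input) in one shot; the channel factorization forces the score to account for each word, which is what the paper links to lower variance across prompts.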
- Asia > Middle East > Jordan (0.04)
- South America > Peru (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (5 more...)
- Leisure & Entertainment > Sports (0.93)
- Education (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Zero-shot Multi-Domain Dialog State Tracking Using Descriptive Rules
Altszyler, Edgar, Brusco, Pablo, Basiou, Nikoletta, Byrnes, John, Vergyri, Dimitra
In this work, we present a framework for incorporating descriptive logical rules into state-of-the-art neural networks, enabling them to learn how to handle unseen labels without the introduction of any new training data. The rules are integrated into existing networks without modifying their architecture, through an additional term in the network's loss function that penalizes states of the network that do not obey the designed rules. As a case study, the framework is applied to an existing neural-based Dialog State Tracker. Our experiments demonstrate that the inclusion of logical rules allows the prediction of unseen labels without deteriorating the predictive capacity of the original system.
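The loss-level integration can be sketched as follows. This is a generic soft-penalty formulation (slot names, rule form, and the hinge-style penalty are illustrative assumptions, not the paper's exact rules or loss):

```python
def rule_penalty(probs, antecedent, consequent):
    """Soft penalty for violating 'if antecedent is active then consequent
    must be active': max(0, p_antecedent - p_consequent)."""
    return max(0.0, probs[antecedent] - probs[consequent])

def total_loss(task_loss, probs, rules, lam=1.0):
    # Rules enter only through an extra loss term; the network architecture
    # itself is left untouched, as in the paper's framework.
    return task_loss + lam * sum(rule_penalty(probs, a, c) for a, c in rules)

# Hypothetical dialog-state slot probabilities and one designed rule.
probs = {"restaurant-booked": 0.9, "restaurant-name-filled": 0.4}
rules = [("restaurant-booked", "restaurant-name-filled")]
total = total_loss(0.2, probs, rules)
print(total)
```

Gradient descent on `total` then pushes the network toward states that satisfy the rules, including for labels never seen in training.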
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- North America > United States (0.04)
AI classifies songs from genres it has never heard before
Even casual music fans can distinguish songs by category without great difficulty, but that's not the case for computers. Most audio-based music classification and tagging systems use categorical supervised learning -- in other words, learning a function that maps songs to genres based on example pairs -- with a fixed set of labels that intrinsically can't handle unseen labels, such as newly added genres. That's why a team of scientists at Naver Corp., an internet content service company headquartered in South Korea, investigated a zero-shot alternative in a paper ("Zero-Shot Learning for Audio-based Music Classification and Tagging") published on the preprint server arXiv.org. Their AI classification system learns how to recognize songs without any labeled training data by taking into account side information about musical instruments, words in descriptions about songs, and more. The researchers settled on two types of side information at the outset of the study: human-labeled attribute information and general word semantic information.
- Media > Music (0.38)
- Leisure & Entertainment (0.38)
Zero Shot Learning with the Isoperimetric Loss
Deutsch, Shay, Bertozzi, Andrea, Soatto, Stefano
We introduce the isoperimetric loss as a regularization criterion for learning the map from a visual representation to a semantic embedding, to be used to transfer knowledge to unknown classes in a zero-shot learning setting. We use a pre-trained deep neural network model as a visual representation of image data, a Word2Vec embedding of class labels, and linear maps between the visual and semantic embedding spaces. However, the spaces themselves are not linear, and we postulate the sample embedding to be populated by noisy samples near otherwise smooth manifolds. We exploit the graph structure defined by the sample points to regularize the estimates of the manifolds by inferring the graph connectivity using a generalization of the isoperimetric inequalities from Riemannian geometry to graphs. Surprisingly, this regularization alone, paired with the simplest baseline model, outperforms the state-of-the-art among fully automated methods in zero-shot learning benchmarks such as AwA and CUB. This improvement is achieved solely by learning the structure of the underlying spaces by imposing regularity.
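The graph-based regularization idea can be sketched as follows. This is a generic neighbor-averaging smoothing step, not the paper's isoperimetric loss (which infers graph connectivity via a graph generalization of isoperimetric inequalities); it only illustrates denoising noisy sample embeddings toward the smooth manifold their graph neighbors define:

```python
import numpy as np

def smooth(emb, adj, alpha=0.5):
    """Pull each embedding toward the mean of its graph neighbors."""
    deg = adj.sum(axis=1, keepdims=True)
    neighbor_mean = (adj @ emb) / np.maximum(deg, 1)
    return (1 - alpha) * emb + alpha * neighbor_mean

rng = np.random.default_rng(2)
emb = rng.standard_normal((5, 3))            # noisy sample embeddings
adj = np.ones((5, 5)) - np.eye(5)            # toy fully connected graph
smoothed = smooth(emb, adj)

# Smoothing contracts the embeddings toward their mean, reducing spread.
print(smoothed.var() < emb.var())  # True
```

In the paper, the key step is choosing which edges of this graph to keep; the smoothing above assumes the connectivity is already given.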
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
All-in Text: Learning Document, Label, and Word Representations Jointly
Nam, Jinseok (Technische Universität Darmstadt) | Mencía, Eneldo Loza (Technische Universität Darmstadt) | Fürnkranz, Johannes (Technische Universität Darmstadt)
Conventional multi-label classification algorithms treat the target labels of the classification task as mere symbols that are devoid of inherent semantics. However, in many cases textual descriptions of these labels are available or can be easily constructed from public document sources such as Wikipedia. In this paper, we investigate an approach for embedding documents and labels into a joint space while sharing word representations between documents and labels. For finding such embeddings, we rely on the text of documents as well as descriptions of the labels. The use of such label descriptions not only promises increased performance on conventional multi-label text classification tasks, but also makes it possible to predict labels that have not been seen during the training phase. The potential of our method is demonstrated on the multi-label classification task of assigning keywords from the Medical Subject Headings (MeSH) to publications in biomedical research, both in a conventional and in a zero-shot learning setting.
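The shared-word-representation idea can be sketched as follows. In this toy version (with made-up 2-d word vectors), both documents and label descriptions are embedded by averaging the same word vectors, so an unseen label can be scored purely from its textual description:

```python
import numpy as np

# Shared word vectors (hypothetical toy values); documents AND label
# descriptions are embedded with the same table.
word_vecs = {
    "heart": np.array([1.0, 0.0]), "disease": np.array([0.8, 0.2]),
    "gene": np.array([0.0, 1.0]), "expression": np.array([0.1, 0.9]),
}

def embed(text):
    return np.mean([word_vecs[w] for w in text.split()], axis=0)

def score(doc, label_desc):
    """Cosine similarity between a document and a label description."""
    d, l = embed(doc), embed(label_desc)
    return float(d @ l / (np.linalg.norm(d) * np.linalg.norm(l)))

doc = "gene expression"
# A label never seen in training is still scorable from its description:
print(score(doc, "heart disease") < score(doc, "gene expression"))  # True
```

Because the label embedding is built from ordinary words rather than a learned per-label parameter, no training examples of the label are required, which is what enables the zero-shot MeSH setting.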